managed file events #228
Open
DoyleDev wants to merge 4 commits into
alexott
requested changes
Feb 6, 2026
Contributor
Pull request overview
Adds a new Terraform module and a corresponding example to provision AWS + Databricks Unity Catalog resources required to use Databricks Managed File Events (managed SQS + file-notification mode) with Auto Loader.
Changes:
- Introduces `modules/aws-managed-file-events` to create/use an S3 bucket, create the UC IAM role/policy, and provision a storage credential + external location (optionally a catalog) with file events enabled.
- Adds `examples/aws-managed-file-events` demonstrating how to call the module and configure providers/inputs.
- Adds module/example documentation plus terraform-docs Makefile targets.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| modules/aws-managed-file-events/versions.tf | New module provider requirements (missing required_version). |
| modules/aws-managed-file-events/variables.tf | Module inputs/locals (includes unused required vars; missing conditional validations). |
| modules/aws-managed-file-events/s3.tf | Optional S3 bucket creation + encryption/public access block, or data source for existing bucket. |
| modules/aws-managed-file-events/iam.tf | IAM role/policy creation using Databricks UC policy/assume-role policy data sources. |
| modules/aws-managed-file-events/main.tf | Creates storage credential, external location with managed file events, and grants. |
| modules/aws-managed-file-events/catalog.tf | Optional catalog creation + grants (force-destroy wired to bucket flag). |
| modules/aws-managed-file-events/outputs.tf | Exposes bucket, IAM role, storage credential, external location, and optional catalog outputs. |
| modules/aws-managed-file-events/README.md | New module docs + usage snippets (contains a few copy/paste issues). |
| modules/aws-managed-file-events/Makefile | terraform-docs helper targets for the module. |
| examples/aws-managed-file-events/versions.tf | Example provider requirements (missing required_version). |
| examples/aws-managed-file-events/providers.tf | Example AWS + Databricks provider configuration (PAT var currently unused). |
| examples/aws-managed-file-events/variables.tf | Example input variables (includes unused/misdescribed PAT variable). |
| examples/aws-managed-file-events/main.tf | Invokes the new module. |
| examples/aws-managed-file-events/outputs.tf | Example outputs exposing module outputs. |
| examples/aws-managed-file-events/README.md | Example instructions + code snippets (some inaccuracies). |
| examples/aws-managed-file-events/terraform.tfvars | Sample tfvars (includes secret-looking placeholders). |
| examples/aws-managed-file-events/Makefile | terraform-docs helper targets for the example. |
Comment on lines +1 to +6
| terraform { | ||
| required_providers { | ||
| aws = { | ||
| source = "hashicorp/aws" | ||
| version = ">= 5.0" | ||
| } |
Comment on lines +1 to +5
| terraform { | ||
| required_providers { | ||
| aws = { | ||
| source = "hashicorp/aws" | ||
| version = ">= 5.0" |
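The file summary flags both `versions.tf` files as missing `required_version`. A minimal sketch of one way to add it follows. The Terraform version floor and the whole `databricks` entry (its presence here and its version constraint) are assumptions, since the quoted diff only shows the `aws` provider:

```hcl
terraform {
  # Assumed floor: pick the oldest Terraform release the module is actually tested against.
  required_version = ">= 1.9.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0" # as in the quoted diff
    }

    # Assumed entry: the diff excerpt is cut off before any second provider.
    databricks = {
      source  = "databricks/databricks"
      version = ">= 1.0.0" # hypothetical constraint
    }
  }
}
```

A floor of 1.9 is only needed if variable validations reference other variables; without that feature a lower floor may be adequate.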
Comment on lines +12 to +25
| variable "region" { | ||
| type = string | ||
| description = "(Required) AWS region where the assets will be deployed" | ||
| } | ||
|
|
||
| variable "aws_account_id" { | ||
| type = string | ||
| description = "(Required) AWS account ID where the IAM role will be created" | ||
| } | ||
|
|
||
| variable "databricks_account_id" { | ||
| type = string | ||
| description = "(Required) Databricks Account ID" | ||
| } |
| variable "existing_bucket_name" { | ||
| type = string | ||
| description = "(Optional) Name of existing S3 bucket when create_bucket is false" | ||
| default = null |
| variable "catalog_name" { | ||
| type = string | ||
| description = "(Optional) Name for the catalog. Required if create_catalog is true" | ||
| default = null |
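The file summary notes that conditionally required inputs such as these lack validations. One way to express the conditions is sketched here, assuming Terraform 1.9 or later (where a `validation` block may reference other variables) and assuming `create_bucket` and `create_catalog` are boolean inputs with the defaults shown. Their declarations are not part of the quoted diff:

```hcl
# Assumed declaration: only the name create_bucket appears in the quoted diff.
variable "create_bucket" {
  type        = bool
  description = "(Optional) Whether the module creates the S3 bucket"
  default     = true
}

variable "existing_bucket_name" {
  type        = string
  description = "(Optional) Name of existing S3 bucket when create_bucket is false"
  default     = null

  # Fail at plan time if the caller opts out of bucket creation without naming a bucket.
  validation {
    condition     = var.create_bucket || var.existing_bucket_name != null
    error_message = "existing_bucket_name must be set when create_bucket is false."
  }
}

# Assumed declaration: only the name create_catalog appears in the quoted diff.
variable "create_catalog" {
  type        = bool
  description = "(Optional) Whether the module creates a catalog"
  default     = false
}

variable "catalog_name" {
  type        = string
  description = "(Optional) Name for the catalog. Required if create_catalog is true"
  default     = null

  # Fail at plan time if a catalog is requested without a name.
  validation {
    condition     = !var.create_catalog || var.catalog_name != null
    error_message = "catalog_name must be set when create_catalog is true."
  }
}
```

On older Terraform releases, a `precondition` inside a resource `lifecycle` block (available since 1.2) could carry the same checks.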
Comment on lines +103 to +112
| Or in Lakeflow Declarative Pipelines: | ||
|
|
||
| ```python | ||
| @dlt.table | ||
| def my_table(): | ||
| return spark.readStream.format("cloudFiles") \ | ||
| .option("cloudFiles.format", "json") \ | ||
| .option("cloudFiles.useManagedFileEvents", "true") \ | ||
| .load("s3://bucket/path") | ||
| ``` |
Comment on lines +48 to +55
|
|
||
|
|
||
| variable "databricks_pat_token" { | ||
| type = string | ||
| sensitive = true | ||
| description = "(Required) Databricks service principal client secret" | ||
| } | ||
|
|
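The summary describes `databricks_pat_token` as unused and misdescribed: the name says personal access token while the description says service principal client secret. A sketch of one possible cleanup follows. It assumes the example authenticates with the service principal OAuth credentials that appear in the sample tfvars (`databricks_host`, `databricks_client_id`, `databricks_client_secret`) and that the PAT variable can simply be dropped. The descriptions are illustrative, and the PR may already declare some of these variables:

```hcl
variable "databricks_host" {
  type        = string
  description = "(Required) Databricks workspace URL"
}

variable "databricks_client_id" {
  type        = string
  description = "(Required) Databricks service principal client ID"
}

variable "databricks_client_secret" {
  type        = string
  sensitive   = true
  description = "(Required) Databricks service principal client secret"
}

# OAuth machine-to-machine authentication: no PAT variable is needed.
provider "databricks" {
  host          = var.databricks_host
  client_id     = var.databricks_client_id
  client_secret = var.databricks_client_secret
}
```

If PAT authentication is the intended path instead, the alternative would be to keep the variable, correct its description, and pass it to the provider's `token` argument.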
Comment on lines +5 to +9
| databricks_account_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" | ||
| databricks_host = "https://my-workspace.cloud.databricks.com" | ||
| databricks_pat_token = "dapixxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" | ||
| databricks_client_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" | ||
| databricks_client_secret = "dosexxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" |
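The summary flags these sample values as secret-looking. A sketch of a tfvars file that keeps only non-sensitive values and uses placeholders that cannot be mistaken for real tokens is shown here. The angle-bracket placeholder style is an assumption, and the secret is expected to arrive through Terraform's `TF_VAR_` environment variable mechanism instead:

```hcl
# Non-sensitive values only; replace the placeholders before use.
databricks_account_id = "<databricks-account-id>"
databricks_host       = "https://my-workspace.cloud.databricks.com"
databricks_client_id  = "<service-principal-client-id>"

# databricks_client_secret is intentionally not set in this file.
# Export it in the shell before running Terraform, for example:
#   export TF_VAR_databricks_client_secret="<client-secret>"
```

Keeping token-shaped strings such as the `dapi` and `dose` prefixes out of committed files also avoids false positives from secret scanners.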
Comment on lines +94 to +105
| Or in Lakeflow Declarative Pipelines: | ||
|
|
||
| ```python | ||
| from pyspark import pipelines as dp | ||
|
|
||
| @dp.table | ||
| def my_table(): | ||
| return spark.readStream.format("cloudFiles") \ | ||
| .option("cloudFiles.format", "json") \ | ||
| .option("cloudFiles.useManagedFileEvents", "true") \ | ||
| .load("/Volumes") # Ingesting from a volume that points to your S3 bucket will be more performant than the S3 location itself. | ||
| ``` |
| 2. Add a `variables.tf` with the same content in [variables.tf](variables.tf) | ||
| 3. Add a `terraform.tfvars` file and provide values to each defined variable | ||
| 4. Configure authentication to your Databricks workspace and AWS account | ||
| 5. Add a `output.tf` file |
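For step 5, a sketch of what such an outputs file might contain. The module label `managed_file_events` and both output names are hypothetical; the real names are whatever the module's `outputs.tf` defines:

```hcl
# Forward selected module outputs so they are visible after `terraform apply`.
output "external_location_name" {
  description = "Name of the external location with file events enabled"
  value       = module.managed_file_events.external_location_name # hypothetical output name
}

output "storage_credential_name" {
  description = "Name of the Unity Catalog storage credential"
  value       = module.managed_file_events.storage_credential_name # hypothetical output name
}
```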
Creates a module and an example showing how to provision the IAM role, policy, external location, and storage credential resources needed for managed file events.